Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Natural Language Morphology Integration in Off-Line Arabic Optical Text Recognition

Internal identifier: 000345 (Main/Exploration); previous: 000344; next: 000346


Authors: Slim Kanoun [Tunisia]; Adel Alimi [Tunisia]; Yves Lecourtier [France]

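Source: IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, April 2011 (ISSN 1083-4419)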

RBID : Hal:hal-00591943

Abstract

In this paper, we propose a new linguistics-based approach, called the affixal approach, for Arabic word and text image recognition. Most existing work in the field integrates knowledge of the Arabic language into the recognition process in one of two ways: either after recognition, using a language dictionary (a dictionary of words) to validate the word hypotheses proposed by the OCR, or during recognition (lexicon-directed recognition), using a statistical language model (a hidden Markov model or N-grams). The proposed approach instead uses the linguistic concepts of the vocabulary to direct and simplify the recognition process. Its principal contribution is the ability to categorize word hypotheses into words that are either derived or not derived from roots, and to characterize each word hypothesis morphologically, in order to prepare the text hypotheses for later analyses (for example, syntactic analysis to filter the sentence hypotheses).
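The abstract describes the affixal approach only at a high level, so the following is a toy sketch of the general idea of affix-based filtering of OCR word hypotheses, not the authors' implementation. It assumes unvocalized words in Buckwalter transliteration and uses small, hypothetical lexicons (`PREFIXES`, `SUFFIXES`, `PATTERNS`, `ROOTS`, `NON_DERIVED`). Each hypothesis is split into prefix + stem + suffix; the stem is either looked up among non-derived words or matched against a derivation pattern whose slot letters `f`, `E`, `l` yield a root. Hypotheses that receive no analysis are discarded.

```python
from dataclasses import dataclass
from typing import Optional

# Toy lexicons in unvocalized Buckwalter transliteration. Every entry is a
# hypothetical illustration, not a resource from the paper.
PREFIXES = ["", "w", "Al", "wAl", "b", "bAl"]  # conjunction, article, preposition
SUFFIXES = ["", "p", "wn", "At", "h"]          # feminine, plurals, pronoun
PATTERNS = ["fEl", "fAEl", "mfEwl", "fEAl"]    # derivation patterns (awzan)
ROOT_SLOTS = set("fEl")                        # pattern letters standing for root consonants
ROOTS = {"ktb", "drs", "Elm"}
NON_DERIVED = {"fy", "mn", "Ely", "twns"}      # particles and a proper noun


@dataclass(frozen=True)
class Analysis:
    prefix: str
    stem: str
    suffix: str
    category: str                  # "derived" or "non-derived"
    root: Optional[str] = None
    pattern: Optional[str] = None


def match_pattern(stem: str, pattern: str) -> Optional[str]:
    """Return the root if `stem` instantiates `pattern`, otherwise None."""
    if len(stem) != len(pattern):
        return None
    root = []
    for s, p in zip(stem, pattern):
        if p in ROOT_SLOTS:
            root.append(s)         # slot position: the stem letter is a root consonant
        elif s != p:
            return None            # a fixed pattern letter must appear verbatim
    return "".join(root)


def analyse(word: str) -> list[Analysis]:
    """Enumerate every prefix + stem + suffix split that the lexicons accept."""
    results = []
    for pre in PREFIXES:
        if not word.startswith(pre):
            continue
        for suf in SUFFIXES:
            if not word.endswith(suf):   # endswith("") is always True
                continue
            stem = word[len(pre):len(word) - len(suf)]
            if len(stem) < 2:
                continue
            if stem in NON_DERIVED:
                results.append(Analysis(pre, stem, suf, "non-derived"))
            for pat in PATTERNS:
                root = match_pattern(stem, pat)
                if root in ROOTS:
                    results.append(Analysis(pre, stem, suf, "derived", root, pat))
    return results


def filter_hypotheses(hypotheses: list[str]) -> dict[str, list[Analysis]]:
    """Keep only the OCR word hypotheses that receive at least one analysis."""
    kept = {}
    for h in hypotheses:
        analyses = analyse(h)
        if analyses:
            kept[h] = analyses
    return kept
```

For example, `filter_hypotheses(["wAlkAtb", "wAlkAtp", "fy", "mktwbAt"])` keeps `wAlkAtb` (prefix `wAl`, root `ktb`, pattern `fAEl`), `fy` (non-derived) and `mktwbAt` (root `ktb`, pattern `mfEwl`, suffix `At`), and rejects `wAlkAtp`, whose candidate roots are not in `ROOTS`. Because each surviving hypothesis carries its prefix, suffix, root and pattern, a later stage such as a syntactic analyser could use these features, which is the role the abstract assigns to morphological characterization.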

Url: https://hal.archives-ouvertes.fr/hal-00591943


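Affiliations: REsearch Group in Intelligent Machines (REGIM), École Nationale d'Ingénieurs de Sfax, Tunisia; Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Université de Rouen, France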




The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Natural Language Morphology Integration in Off-Line Arabic Optical Text Recognition</title>
<author>
<name sortKey="Kanoun, Slim" sort="Kanoun, Slim" uniqKey="Kanoun S" first="Slim" last="Kanoun">Slim Kanoun</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-210908" status="VALID">
<orgName>REsearch Group in Intelligent Machines</orgName>
<orgName type="acronym">REGIM</orgName>
<desc>
<address>
<country key="TN"></country>
</address>
<ref type="url">http://regim.org/</ref>
</desc>
<listRelation>
<relation active="#struct-301282" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-301282" type="direct">
<org type="institution" xml:id="struct-301282" status="VALID">
<orgName>École Nationale d'Ingénieurs de Sfax [Sfax]</orgName>
<orgName type="acronym">ENIS</orgName>
<desc>
<address>
<addrLine>Dépt. G.E, (ENIS), B.P. 1173, 3038 Sfax</addrLine>
<country key="TN"></country>
</address>
<ref type="url">http://www.enis.rnu.tn/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Tunisie</country>
</affiliation>
</author>
<author>
<name sortKey="Alimi, Adel" sort="Alimi, Adel" uniqKey="Alimi A" first="Adel" last="Alimi">Adel Alimi</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-210908" status="VALID">
<orgName>REsearch Group in Intelligent Machines</orgName>
<orgName type="acronym">REGIM</orgName>
<desc>
<address>
<country key="TN"></country>
</address>
<ref type="url">http://regim.org/</ref>
</desc>
<listRelation>
<relation active="#struct-301282" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-301282" type="direct">
<org type="institution" xml:id="struct-301282" status="VALID">
<orgName>École Nationale d'Ingénieurs de Sfax [Sfax]</orgName>
<orgName type="acronym">ENIS</orgName>
<desc>
<address>
<addrLine>Dépt. G.E, (ENIS), B.P. 1173, 3038 Sfax</addrLine>
<country key="TN"></country>
</address>
<ref type="url">http://www.enis.rnu.tn/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Tunisie</country>
</affiliation>
</author>
<author>
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300317" type="direct">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Bourgogne</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-00591943</idno>
<idno type="halId">hal-00591943</idno>
<idno type="halUri">https://hal.archives-ouvertes.fr/hal-00591943</idno>
<idno type="url">https://hal.archives-ouvertes.fr/hal-00591943</idno>
<date when="2011-04">2011-04</date>
<idno type="wicri:Area/Hal/Corpus">000087</idno>
<idno type="wicri:Area/Hal/Curation">000087</idno>
<idno type="wicri:Area/Hal/Checkpoint">000087</idno>
<idno type="wicri:doubleKey">1083-4419:2011:Kanoun S:natural:language:morphology</idno>
<idno type="wicri:Area/Main/Merge">000350</idno>
<idno type="wicri:Area/Main/Curation">000345</idno>
<idno type="wicri:Area/Main/Exploration">000345</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Natural Language Morphology Integration in Off-Line Arabic Optical Text Recognition</title>
<author>
<name sortKey="Kanoun, Slim" sort="Kanoun, Slim" uniqKey="Kanoun S" first="Slim" last="Kanoun">Slim Kanoun</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-210908" status="VALID">
<orgName>REsearch Group in Intelligent Machines</orgName>
<orgName type="acronym">REGIM</orgName>
<desc>
<address>
<country key="TN"></country>
</address>
<ref type="url">http://regim.org/</ref>
</desc>
<listRelation>
<relation active="#struct-301282" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-301282" type="direct">
<org type="institution" xml:id="struct-301282" status="VALID">
<orgName>École Nationale d'Ingénieurs de Sfax [Sfax]</orgName>
<orgName type="acronym">ENIS</orgName>
<desc>
<address>
<addrLine>Dépt. G.E, (ENIS), B.P. 1173, 3038 Sfax</addrLine>
<country key="TN"></country>
</address>
<ref type="url">http://www.enis.rnu.tn/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Tunisie</country>
</affiliation>
</author>
<author>
<name sortKey="Alimi, Adel" sort="Alimi, Adel" uniqKey="Alimi A" first="Adel" last="Alimi">Adel Alimi</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-210908" status="VALID">
<orgName>REsearch Group in Intelligent Machines</orgName>
<orgName type="acronym">REGIM</orgName>
<desc>
<address>
<country key="TN"></country>
</address>
<ref type="url">http://regim.org/</ref>
</desc>
<listRelation>
<relation active="#struct-301282" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-301282" type="direct">
<org type="institution" xml:id="struct-301282" status="VALID">
<orgName>École Nationale d'Ingénieurs de Sfax [Sfax]</orgName>
<orgName type="acronym">ENIS</orgName>
<desc>
<address>
<addrLine>Dépt. G.E, (ENIS), B.P. 1173, 3038 Sfax</addrLine>
<country key="TN"></country>
</address>
<ref type="url">http://www.enis.rnu.tn/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Tunisie</country>
</affiliation>
</author>
<author>
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-23832" status="VALID">
<orgName>Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes</orgName>
<orgName type="acronym">LITIS</orgName>
<desc>
<address>
<addrLine>Avenue de l'Université UFR des Sciences et Techniques 76800 Saint-Etienne du Rouvray</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.litislab.eu</ref>
</desc>
<listRelation>
<relation active="#struct-300317" type="direct"></relation>
<relation name="EA4108" active="#struct-300318" type="direct"></relation>
<relation active="#struct-301288" type="direct"></relation>
<relation active="#struct-301232" type="indirect"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-300317" type="direct">
<org type="institution" xml:id="struct-300317" status="VALID">
<orgName>Université du Havre</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
<tutelle name="EA4108" active="#struct-300318" type="direct">
<org type="institution" xml:id="struct-300318" status="VALID">
<orgName>Université de Rouen</orgName>
<desc>
<address>
<addrLine> 1 rue Thomas Becket - 76821 Mont-Saint-Aignan</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-rouen.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-301288" type="direct">
<org type="department" xml:id="struct-301288" status="VALID">
<orgName>Institut National des Sciences Appliquées - Rouen</orgName>
<orgName type="acronym">INSA Rouen</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-301232" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-301232" type="indirect">
<org type="institution" xml:id="struct-301232" status="VALID">
<orgName>Institut National des Sciences Appliquées</orgName>
<orgName type="acronym">INSA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName>
<settlement type="city">Rouen</settlement>
<region type="region" nuts="2">Région Bourgogne</region>
</placeName>
<orgName type="university">Université de Rouen</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics</title>
<idno type="ISSN">1083-4419</idno>
<imprint>
<date type="datePub">2011-04</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In this paper, we propose a new linguistics-based approach, called the affixal approach, for Arabic word and text image recognition. Most existing work in the field integrates knowledge of the Arabic language into the recognition process in one of two ways: either after recognition, using a language dictionary (a dictionary of words) to validate the word hypotheses proposed by the OCR, or during recognition (lexicon-directed recognition), using a statistical language model (a hidden Markov model or N-grams). The proposed approach instead uses the linguistic concepts of the vocabulary to direct and simplify the recognition process. Its principal contribution is the ability to categorize word hypotheses into words that are either derived or not derived from roots, and to characterize each word hypothesis morphologically, in order to prepare the text hypotheses for later analyses (for example, syntactic analysis to filter the sentence hypotheses).</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
<li>Tunisie</li>
</country>
<region>
<li>Région Bourgogne</li>
</region>
<settlement>
<li>Rouen</li>
</settlement>
<orgName>
<li>Université de Rouen</li>
</orgName>
</list>
<tree>
<country name="Tunisie">
<noRegion>
<name sortKey="Kanoun, Slim" sort="Kanoun, Slim" uniqKey="Kanoun S" first="Slim" last="Kanoun">Slim Kanoun</name>
</noRegion>
<name sortKey="Alimi, Adel" sort="Alimi, Adel" uniqKey="Alimi A" first="Adel" last="Alimi">Adel Alimi</name>
</country>
<country name="France">
<region name="Région Bourgogne">
<name sortKey="Lecourtier, Yves" sort="Lecourtier, Yves" uniqKey="Lecourtier Y" first="Yves" last="Lecourtier">Yves Lecourtier</name>
</region>
</country>
</tree>
</affiliations>
</record>

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000345 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000345 | SxmlIndent | more
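The Dilib commands above require a Wicri installation. As an alternative, here is a minimal sketch that reads the same record with Python's standard `xml.etree.ElementTree` module, assuming the XML shown above has been saved to a file. The wrapper element and the namespace URIs are assumptions introduced only because the record uses the `hal:` and `wicri:` prefixes without declaring them, which a strict XML parser rejects.

```python
import xml.etree.ElementTree as ET

# The record uses the "hal:" and "wicri:" prefixes without declaring them.
# These URIs are placeholders (assumptions) that only bind the prefixes so
# the parser accepts the record; they are not official namespaces.
NS_DECLS = 'xmlns:hal="urn:example:hal" xmlns:wicri="urn:example:wicri"'


def parse_record(text: str) -> ET.Element:
    """Parse one Wicri <record> after wrapping it in an element that binds the prefixes."""
    wrapper = ET.fromstring(f"<wrapper {NS_DECLS}>{text}</wrapper>")
    record = wrapper.find("record")
    if record is None:
        raise ValueError("no <record> element found")
    return record


def summarize(record: ET.Element) -> dict:
    """Extract a few bibliographic fields from the TEI header."""
    title_stmt = record.find("TEI/teiHeader/fileDesc/titleStmt")
    if title_stmt is None:
        raise ValueError("no titleStmt in record")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.findall("author/name")],
        "halId": record.findtext(".//publicationStmt/idno[@type='halId']"),
        "journal": record.findtext(".//series/title"),
        "date": record.findtext(".//imprint/date"),
    }
```

Usage, assuming the record was saved as `record.xml` (a hypothetical file name): `summarize(parse_record(open("record.xml", encoding="utf-8").read()))`. If the saved file starts with an XML declaration, strip it first, because a declaration cannot appear inside the wrapper element.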

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:hal-00591943
   |texte=   Natural Language Morphology Integration in Off-Line Arabic Optical Text Recognition
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024